
297 research outputs found

Reducing the Space Requirement of Suffix Trees

Author: Stefan Kurtz
Publication venue
Publication date: 01/01/1998
Field of study

We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different types. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: data structures; suffix trees; implementation techniques; space reduction

CiteSeerX

Comparative genomics of Arabidopsis and maize: prospects and limitations

Author: Brendel Volker
Kurtz Stefan
Walbot Virginia
Publication venue: BioMed Central
Publication date: 01/01/2002
Field of study

The completed Arabidopsis genome seems to be of limited value as a model for maize genomics. In addition to the expansion of repetitive sequences in maize and the lack of genomic micro-colinearity, maize-specific or highly diverged proteins contribute to a predicted maize proteome of about 50,000 proteins, twice the size of the Arabidopsis proteome.

PubMed Central

Publications at Bielefeld University

Efficient implementation of lazy suffix trees

Author: Giegerich Robert
Kurtz Stefan
Stoye Jens
Publication venue: Wiley
Publication date: 01/01/2003
Field of study

Giegerich R, Kurtz S, Stoye J. Efficient implementation of lazy suffix trees. Software: Practice and Experience. 2003;33(11):1035-1049.

We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different types. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods. Copyright © 2003 John Wiley & Sons, Ltd.
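The idea of evaluating a subtree only when it is first traversed can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce their 12-byte representation: it stores plain lists of suffix start positions, and the names `LazyNode`, `expand`, `build_lazy_suffix_tree`, and `find` are hypothetical, chosen only for illustration.

```python
class LazyNode:
    """Suffix-tree node whose outgoing edges are computed on first use."""

    def __init__(self, text, starts, depth):
        self.text = text
        self.starts = starts  # start positions of all suffixes below this node
        self.depth = depth    # number of characters on the path from the root
        self.edges = None     # None means "not evaluated yet"

    def expand(self):
        """Evaluate this node once: group suffixes by their next character."""
        if self.edges is not None:
            return self.edges
        text, depth = self.text, self.depth
        groups = {}
        for s in self.starts:
            if s + depth < len(text):
                groups.setdefault(text[s + depth], []).append(s)
        self.edges = {}
        for ch, group in groups.items():
            first = group[0]
            if len(group) == 1:
                # Leaf edge: labelled with the rest of the single suffix.
                length = len(text) - (first + depth)
            else:
                # Extend the edge label while all suffixes in the group agree.
                length = 1
                while all(
                    s + depth + length < len(text)
                    and text[s + depth + length] == text[first + depth + length]
                    for s in group
                ):
                    length += 1
            child = LazyNode(text, group, depth + length)
            self.edges[ch] = (first + depth, length, child)
        return self.edges


def build_lazy_suffix_tree(text):
    """Create only the unevaluated root; nothing else is built up front."""
    return LazyNode(text, list(range(len(text))), 0)


def find(root, pattern):
    """Return the sorted start positions of pattern, expanding nodes on demand."""
    node, i = root, 0
    while i < len(pattern):
        edge = node.expand().get(pattern[i])
        if edge is None:
            return []
        start, length, child = edge
        n = min(length, len(pattern) - i)
        if root.text[start:start + n] != pattern[i:i + n]:
            return []
        i += n
        node = child
    return sorted(node.starts)
```

For example, after `root = build_lazy_suffix_tree("abracadabra")`, a call to `find(root, "cad")` expands only the root; the subtree below the edge starting with `a` stays unevaluated until a pattern beginning with `a` is searched.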

Publications at Bielefeld University

Efficient computation of absent words in genomic sequences

Author: Giegerich Robert
Herold Julia
Kurtz Stefan
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Herold J, Kurtz S, Giegerich R. Efficient computation of absent words in genomic sequences. BMC Bioinformatics. 2008;9(1):167.

Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies. Results: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10^9 down to 10^5 bp. Conclusion: The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data
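To make the notion of a shortest absent word (unword) concrete, here is a brute-force Python sketch. It is not the algorithm described in the abstract: it simply enumerates all words of increasing length and keeps a set of the words that occur, so it is only practical for short sequences. The function name `shortest_absent_words` and the default alphabet `ACGT` are assumptions made for illustration.

```python
from itertools import product


def shortest_absent_words(sequence, alphabet="ACGT"):
    """Return (k, words): the smallest length k at which some word over
    `alphabet` does not occur in `sequence`, and all absent words of length k."""
    k = 1
    while True:
        # All words of length k that actually occur in the sequence.
        present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        absent = [
            "".join(letters)
            for letters in product(alphabet, repeat=k)
            if "".join(letters) not in present
        ]
        if absent:
            return k, absent
        k += 1
```

Like the method in the abstract, this needs no length estimate as input: the loop stops at the first length for which at least one word is missing.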

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

Author: Ellinghaus David
Kurtz Stefan
Willhoeft Ute
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: Transposable elements are abundant in eukaryotic genomes, and they are believed to have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs). Results: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR transposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and a specificity of 100% and 72%, respectively. This is comparable to or slightly better than the annotations of previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software. Conclusion: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multitrophic interactions among Western Corn Rootworm, Glomus intraradices and microbial communities in the rhizosphere and endorhiza of maize

Author: Dematheis Flavia
Kurtz Benedikt
Smalla Kornelia
Vidal Stefan
Publication venue
Publication date: 01/01/2013
Field of study

The complex interactions among the maize pest Western Corn Rootworm (WCR), Glomus intraradices (GI; recently renamed Rhizophagus intraradices) and the microbial communities in both the rhizosphere and endorhiza of maize have been investigated in view of new pest control strategies. In a greenhouse experiment, different maize treatments were established: C (control plants), W (plants inoculated with WCR), G (plants inoculated with GI), GW (plants inoculated with GI and WCR). After 20 days of WCR root feeding, larval fitness was measured. Dominant arbuscular mycorrhizal fungi (AMF) in soil and maize endorhiza were analyzed by cloning of 18S rRNA gene fragments of AMF, restriction fragment length polymorphism and sequencing. Bacterial and fungal communities in the rhizosphere and endorhiza were investigated by denaturing gradient gel electrophoresis of 16S rRNA gene and ITS fragments, respectively, PCR amplified from total community DNA. GI significantly reduced WCR larval development and affected the naturally occurring endorhiza AMF and bacteria. WCR root feeding influenced the endorhiza bacteria as well. GI can be used in integrated pest management programs, rendering WCR larvae more susceptible to predation by natural enemies. The mechanisms behind the interaction between GI and WCR remain unknown. However, our data suggest that GI might act indirectly via plant-mediated mechanisms influencing the endorhiza microbial communities.

Institutional Repository of the Freie Universität Berlin

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context

Author: Kurtz Stefan
Mader Malte
Simon Ronald
Steinbiss Sascha
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Author: Beckstette Michael
Giegerich Robert
Homann Robert
Kurtz Stefan
Publication venue: Oxford University Press
Publication date: 01/01/2009
Field of study

Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive.

CiteSeerX

PubMed Central

Publications at Bielefeld University

Structator: fast index-based search for RNA sequence-structure patterns

Author: Backofen Rolf
Beckstette Michael
Kurtz Stefan
Meyer Fernando
Will Sebastian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2010
Field of study

Background: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results: We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This makes it possible to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns, and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. Deutsche Forschungsgemeinschaft (grant WI 3628/1-1
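The inside-out matching order for a stem-loop (loop first, then the pairing bases outward) can be sketched in a few lines of Python. This is only an illustration of the matching order: it replaces the affix array with a plain linear scan via `str.find`, so it does not achieve the sublinear running time mentioned above. The function name `match_stem_loops`, its parameters, and the choice to accept G-U wobble pairs in `PAIRS` are assumptions, not part of Structator.

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def match_stem_loops(rna, loop, stem_len):
    """Return (start, end) spans, end exclusive, of stem-loops in `rna` whose
    loop equals `loop` and whose stem consists of `stem_len` base pairs."""
    hits = []
    pos = rna.find(loop)  # step 1: match the loop region
    while pos != -1:
        left, right = pos - 1, pos + len(loop)
        paired = 0
        # Step 2: extend outward, one base pair at a time.
        while (
            paired < stem_len
            and left >= 0
            and right < len(rna)
            and (rna[left], rna[right]) in PAIRS
        ):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        pos = rna.find(loop, pos + 1)
    return hits
```

A candidate is discarded as soon as one pair of boundary bases fails to pair, which is the search space reduction that base pairing information provides.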

DSpace@MIT

Crossref

Springer - Publisher Connector

PubMed Central

Publications at Bielefeld University

Ergonomic design of user guides in multimedia environments with remote controls and onscreen displays

Author: Kurtz Peter
Lutherdt Stefan
Publication venue
Publication date: 14/11/2008
Field of study

During a three-year project, a new type of remote control and onscreen display was developed following a comparison and analysis of existing remote controls and multimedia devices. The project was initiated by a German producer of consumer electronics. Usability and user acceptance were tested and supplemented by questionnaires. The distinguishing feature of the system is a remote control with only one control element and a correspondingly designed onscreen display. In this new onscreen display, the motion of the thumb on the surface of the sensor pad produces a corresponding motion inside the display. All operational functions are integrated in four menus at the two sides, the top, and the bottom of the screen. User testing showed that haptic elements are well suited to supporting the user through imprinted user routines and to avoiding the need for visual control during use.

Digitale Bibliothek Thüringen